Dictionary refinements based on phonetic consensus and non-uniform pronunciation reduction
Authors
Abstract
In this paper we present a procedure for refining the recognition dictionary that combines two approaches to pruning unneeded pronunciations. First, pruning is applied non-uniformly, according to the characteristics of each word. Although this straightforward operation can produce high-quality dictionaries, it makes the refined dictionary heavily dependent on the data used in the process. Second, for words not observed in the data, we propose using multiple sequence alignment techniques to find phonetic consensus among the pronunciation variants and to select the pronunciations that will represent the unobserved words. Experimental results show that our dictionary refinement method improves recognition performance in two respects: it increases recognition accuracy by reducing cross-word confusability, and it improves recognition speed by reducing the complexity of the search space.
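The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the pruning criterion or the alignment method, so everything below is an assumption made for illustration. Here `prune_observed` keeps the most frequent variants of a word until a hypothetical `coverage` fraction of its observed uses is reached (so the number of variants kept differs per word), and `consensus_variant` stands in for multiple sequence alignment with a simpler substitute: it picks the variant with the smallest total edit distance to all other variants of the word.

```python
def prune_observed(counts, coverage=0.9):
    """Non-uniform pruning sketch for one observed word.

    counts maps each pronunciation (a tuple of phones) to how often it was
    observed. Variants are kept in order of frequency until the kept ones
    account for at least `coverage` of all observations. The criterion and
    the default threshold are assumptions, not taken from the paper.
    """
    total = sum(counts.values())
    kept, covered = [], 0
    for pron, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered >= coverage * total:
            break
        kept.append(pron)
        covered += n
    return kept


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def consensus_variant(variants):
    """Simplified stand-in for phonetic consensus on an unobserved word.

    Returns the variant closest, in summed edit distance, to all variants
    of the word. A real multiple sequence alignment would align all
    variants jointly; this pairwise version is only an approximation.
    """
    return min(variants,
               key=lambda v: (sum(edit_distance(v, w) for w in variants), v))


# Hypothetical data: one observed word and one unobserved word.
observed = {
    ("t", "ah", "m", "ey", "t", "ow"): 8,
    ("t", "ah", "m", "aa", "t", "ow"): 1,
    ("t", "ax", "m", "ey", "t", "ow"): 1,
}
unobserved = [
    ("d", "ey", "t", "ax"),
    ("d", "ae", "t", "ax"),
    ("d", "ey", "d", "ax"),
]

kept = prune_observed(observed, coverage=0.75)
chosen = consensus_variant(unobserved)
```

With the hypothetical counts above, a word dominated by one variant is reduced to that variant, while a word with more evenly spread usage would retain several, which is the sense in which the pruning is non-uniform.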
Similar resources
Wiktionary as a source for automatic pronunciation extraction
In this paper, we analyze whether dictionaries from the World Wide Web which contain phonetic notations, may support the rapid creation of pronunciation dictionaries within the speech recognition and speech synthesis system building process. As a representative dictionary, we selected Wiktionary [1] since it is at hand in multiple languages and, in addition to the definitions of the words, many...
Inferring Hierarchical Pronunciation Rules from a Phonetic Dictionary
This work presents a new phonetic transcription system based on a tree of hierarchical pronunciation rules expressed as context-specific grapheme-phoneme correspondences. The tree is automatically inferred from a phonetic dictionary by incrementally analyzing deeper context levels, eventually representing a minimum set of exhaustive rules that pronounce without errors all the words in the train...
Large vocabulary continuous speech recognition based on cross-morpheme phonetic information
In this paper, we present a novel method to regulate lexical connections among morpheme-based pronunciation lexicons for Korean large vocabulary continuous speech recognition (LVCSR) systems. A pronunciation dictionary plays an important role in subword-based LVCSR in that pronunciation variations such as coarticulation will deteriorate the performance of an LVCSR system if it is not well accou...
HMM-based Pronunciation Dictionary Generation
In this paper, we discuss automatically generating a phonetic pronunciation from an orthographic spelling of words. The letter-sequence to phoneme-sequence mapping is useful in a variety of contexts, including text-to-speech applications, automatic spelling correction, and generating a pronunciation lexicon for a new training dataset which contains out-of-vocabulary words. A system based on hid...
The Effect of Using Phonetic Websites on Iranian EFL Learners’ Word Level Pronunciation
Computer-assisted language learning (CALL) is reaching an utmost position in the pedagogical field of English as a Second or Foreign Language (ESL/EFL). The present study was carried out to study the effect of using phonetic websites on Iranian EFL students’ pronunciation and knowledge of phonemic symbols. Participants of the study included 30 EFL female pre-intermediate students studyin...